Storing multipart XML documents

ABSTRACT

A method for storing an XML document, the method including decomposing the XML document into a hierarchy of nodes and creating an index of the nodes.

REFERENCE TO RELATED APPLICATION

The present disclosure is based on and claims the benefit of Provisional Application Ser. No. 60/573,513 filed May 21, 2004, the entire contents of which are herein incorporated by reference.

BACKGROUND

1. Technical Field

The present disclosure relates to XML documents and, more specifically, to storing multipart XML documents.

2. Description of the Related Art

Extensible Markup Language (XML) is a popular markup language capable of describing many different types of data, by which software applications can communicate with one another and with human users. XML is a human readable markup language. XML instructions can be read and understood by a human, in contrast to, for example, computer languages that store and send information as binary data.

Because XML is a human readable, platform independent standard, data formulated in XML may be read and interpreted by any computer utilizing any platform, and is easily manipulated and tested by software developers. The ease of use of XML across many different hardware and software environments makes it a popular choice for communication between modern computer software, especially web services based applications. The growing popularity of web services based applications and XML has created a growing demand for new and useful methods to make the programming of web based applications quicker, easier and more efficient. The Apache Cocoon Project (Cocoon) is one example of such a system. Cocoon is a web development framework built around the concept of separation of concerns and component-based web development. Cocoon seeks to provide a framework by which a set of components may be programmed where each component provides an isolated function. These components may then be used as building blocks for the development of more complex components and/or complete web services based applications. Using the Cocoon framework, users may be able to hook together a series of pre-developed components to form a web based application without needing to engage in the minute details of computer programming. Cocoon also allows users to extract added benefit from programmed components by reusing them in a wide range of web applications without reprogramming what has previously been programmed.

XML document tools have been developed for the storing and searching of data. For example, Lightweight Directory Access Protocol (LDAP) provides for the accessing of on-line directory services. As part of the Cocoon framework, a method is provided for the storing and searching of XML document data. Cocoon provides for a database implementation that allows for the manual indexing of particular elements of an XML document, allowing for the XML document to be located during a search. However, Cocoon's ability to search for particular sub documents of an XML document is very limited.

Other methods for searching through XML documents exist. For example, XML documents may be searched through linearly, one after another. However, such a non-indexed search can take a long time, especially in light of the long length of XML documents attributable to their human readable nature.

SUMMARY

A method for storing an XML document, the method including decomposing the XML document into a hierarchy of nodes, and creating an index of the nodes.

A method for searching for an XML document, the method including searching for the XML document using an index that has been created by decomposing one or more XML documents into a hierarchy of nodes.

A system for storing an XML document includes a decomposing unit for decomposing the XML document into a hierarchy of nodes, and a creating unit for creating an index of the nodes.

A system for searching an XML document includes a searching unit for searching the XML document using an index that has been created by decomposing one or more XML documents into a hierarchy of nodes.

A computer system includes a processor and a computer recording medium including computer executable code executable by the processor for storing an XML document. The computer executable code includes code for decomposing the XML document into a hierarchy of nodes, and code for creating an index of the nodes.

A computer system includes a processor and a computer recording medium including computer executable code executable by the processor for searching an XML document. The computer executable code includes code for searching the XML document using an index that has been created by decomposing one or more XML documents into a hierarchy of nodes.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the present disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:

FIG. 1 shows an example of an XML document that may be stored and searched according to an embodiment of the present disclosure;

FIG. 2 shows a diagram of how the example XML document that may be stored and searched according to an embodiment of the present disclosure is organized;

FIG. 3 shows how the example XML document may be decomposed according to an embodiment of the present disclosure;

FIG. 4 shows the text of the nodes that the example XML document may be decomposed into according to an embodiment of the present disclosure;

FIG. 5 shows a flowchart illustrating an embodiment of the present disclosure; and

FIG. 6 shows an example of a computer system capable of implementing the method and apparatus according to embodiments of the present disclosure.

DETAILED DESCRIPTION

In describing the preferred embodiments of the present disclosure illustrated in the drawings, specific terminology is employed for sake of clarity. However, the present disclosure is not intended to be limited to the specific terminology so selected, and it is to be understood that each specific element includes all technical equivalents which operate in a similar manner.

The present disclosure describes a system and method for indexing and searching through stored XML documents and XML sub documents. According to embodiments of the present disclosure, searching through stored XML documents and sub documents may be quick and may minimize processor overhead; that is, it may be computationally efficient. According to embodiments of the present disclosure, search results may be returned as the actual text of the XML document. This may offer advantages; for example, the retrieval of actual XML text may facilitate XML cryptography.

FIG. 5 shows a flowchart illustrating an embodiment of the present disclosure. First, an XML document may be provided (Step S51). FIG. 1 shows an example of an XML document 11 that may be provided to be stored and searched according to an embodiment of the present disclosure. The XML document 11 defines “presents” as including a dragon toy that has a green color and a fluffy type, a horse toy that has a blue color and a hairy type, and chocolate food. Embodiments of the present disclosure will be described with respect to this example XML document. Of course, the same methods and systems may be applied, more generally, to any XML document or sub document.

FIG. 2 shows a diagram of how the example XML document 11 to be stored and searched according to an embodiment of the present disclosure is organized. The complete XML document 21 defines presents. Within this document 21, three sub documents 22, 23, 24 may be found. A sub document is a portion of a document that may be hierarchically below a document or sub document and/or may be capable of independent function. The first sub document 22 defines a dragon. The dragon sub document 22 is defined as a toy having a green color 25 and a fluffy type 26. The green color 25 and fluffy type 26 are themselves sub documents that define individual characteristics. For example, the green color sub document 25 defines green as a color.

The second sub document 23 defines a horse. The horse sub document 23 is defined as a toy having a blue color 27 and a hairy type 28. The blue color 27 and the hairy type 28 are themselves sub documents that define individual characteristics.

The third sub document 24 defines chocolate. The chocolate sub document 24 is defined as a food but has no sub documents defined within it.

After an XML document has been provided (Step S51), the provided XML document may be decomposed (Step S52). Decomposing an XML document may be the process of dividing the document into discrete units that can be searched. FIG. 3 shows how the example XML document 11 may be decomposed according to an embodiment of the present disclosure. According to an embodiment of the present disclosure, an XML document may be decomposed into one or more nodes. FIG. 4 shows the text of the nodes that the example XML document may be decomposed into according to an embodiment of the present disclosure. These nodes may be hierarchical, whereby a parent node may have one or more child nodes. The first node generally represents the entire XML document.

Along with the XML document or sub document text, each node may contain a unique directory identifier. The unique directory identifier may list a node number as well as node numbers for the parent of the node and all other ancestors, where they may exist. Each node may contain a node element name. The node element name may be, for example, a name for the node. This name, for example, may be used to identify the node in an index (directory) that may be created to facilitate searches. Each node may contain one or more node attributes. These attributes may be a list of any characteristics of the node that may facilitate searching for that particular document or sub document. The inclusion of node attributes is an optional feature of the present disclosure and other embodiments of the present disclosure do not use node attributes. Where desired, node attributes may be manually entered by a user or automatically generated by analyzing the text of the document or sub document. Additionally, other values may be stored as part of the node. For example, other values may be stored to facilitate searching.
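The node contents described above can be pictured as a small record. The following is a minimal sketch in Python, not the disclosed implementation; the class name `Node` and its field names are hypothetical, and the identifier format simply mirrors the "Element=2, Element=1" style used in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical record for one stored document or sub document."""
    number: int                 # this node's own number
    ancestors: tuple            # ancestor numbers, parent first, root last
    element_name: str           # e.g. "Node2"
    xml_text: str               # XML text of the document or sub document
    attributes: dict = field(default_factory=dict)  # optional search hints

    @property
    def directory_id(self) -> str:
        # Lists the node number followed by the parent and all other ancestors.
        chain = (self.number, *self.ancestors)
        return ", ".join(f"Element={n}" for n in chain)


dragon = Node(number=2, ancestors=(1,), element_name="Node2", xml_text="<toy/>")
print(dragon.directory_id)
```

Keeping the attributes in an optional dictionary reflects the statement that node attributes may be omitted in some embodiments.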

The first node 31, according to the example XML document 11, represents the entire example XML document 11. The hierarchical position of this node 31 is shown in FIG. 3. The full text 41 of the first node 31 is shown in FIG. 4. In addition to containing the full text of the XML document 11, the first node 31 contains a node element name, here “Node1”, and a unique directory identifier, here Element=1. Because only one element is listed as the unique directory identifier, it is clear that this node represents the top of the hierarchy, that is, the full XML document.

Complete XML documents, that is, nodes representing the top of the hierarchy, may list the full XML document as the node text, as shown in FIGS. 3 and 4. XML sub documents, that is, nodes in a position other than the top of the hierarchy (for example, children of the complete XML document or children of other sub documents), may list the XML document section that defines that sub document, including all sub documents that are descendants of that sub document, if any exist.

For example, the second node 32 represents a sub document of the example XML document 11 that defines the toy dragon. The text 42 of the second node 32 is shown in FIG. 4. In addition to including the XML text that defines the dragon as a toy, the text of the second node 32 includes the XML text that defines the color green, which is itself a node 35 and an XML sub document, and the XML text that defines the type fluffy, which is itself a node 36 and an XML sub document.

In addition to including the XML text of the toy dragon sub document, the text 42 of the second node 32 contains a node element name, here “Node2”, and a unique directory identifier, here Element=2, Element=1. Because two elements are listed as the unique directory identifier, it is clear that this node is the second node and is a child of the first node in the hierarchy.
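The reasoning above, in which the number of listed elements gives the position in the hierarchy, can be sketched as a small parser. This is an illustrative Python sketch only; the function name `parse_directory_id` and the returned keys are hypothetical, and the input format is assumed to be the comma-separated "Element=N" list shown in the example.

```python
def parse_directory_id(directory_id):
    """Split an identifier such as "Element=2, Element=1" into its parts."""
    # The first number is the node itself; the rest are its ancestors.
    numbers = [int(part.split("=")[1]) for part in directory_id.split(",")]
    return {
        "node": numbers[0],
        "parent": numbers[1] if len(numbers) > 1 else None,
        "depth": len(numbers),  # 1 means the top of the hierarchy
    }


print(parse_directory_id("Element=2, Element=1"))
```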

Likewise, the third node through the eighth node 33-38 all contain the definitions of their particular sub document along with all of the sub documents that may be their respective descendants. The text of these sub documents 43-48 respectively contains the XML text that defines their particular sub document along with all of the sub documents that may be their respective descendants, along with their respective node element names and unique directory identifiers.
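The decomposition of the example into eight nodes can be sketched as follows. This is a hedged illustration in Python using the standard `xml.etree.ElementTree` module, not the disclosed implementation. The XML string is an assumed reconstruction of the example (FIG. 1 is not reproduced here), the function name `decompose` and the dictionary keys are hypothetical, and breadth-first numbering is inferred from the reference numerals 31-38.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Assumed stand-in for the example document of FIG. 1.
EXAMPLE_XML = (
    "<presents>"
    '<toy name="dragon"><color>green</color><type>fluffy</type></toy>'
    '<toy name="horse"><color>blue</color><type>hairy</type></toy>'
    '<food name="chocolate"/>'
    "</presents>"
)


def decompose(xml_text):
    """Return one node per element, numbered level by level from the top."""
    root = ET.fromstring(xml_text)
    nodes = []
    queue = deque([(root, ())])  # (element, numbers of its ancestors)
    while queue:
        element, ancestors = queue.popleft()
        number = len(nodes) + 1
        chain = (number, *ancestors)
        nodes.append({
            "name": f"Node{number}",
            "directory_id": ", ".join(f"Element={n}" for n in chain),
            # Re-serialized text of this element and all of its descendants.
            "xml_text": ET.tostring(element, encoding="unicode"),
        })
        for child in element:
            queue.append((child, chain))
    return nodes


for node in decompose(EXAMPLE_XML):
    print(node["name"], "|", node["directory_id"])
```

Note that `ET.tostring` re-serializes the element rather than copying the original characters, so a system that must return the exact original text (for example, for XML cryptography) would need to record source offsets instead.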

After an XML document has been decomposed into nodes representing the XML document and all sub documents as a hierarchy of nodes as described above (Step S52), each node may be stored in a hierarchical directory of XML documents and sub documents. A searchable index may be created to facilitate searching of the directory (Step S53). Each node may be indexed according to its attributes. For example, each node may be indexed according to its XML text, unique directory identifier, node element name, and/or any other attributes that may be listed within each node. For example, each node may be indexed according to its hierarchy.
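One simple way to picture such an index is an inverted mapping from attribute values to node numbers. The sketch below is an assumption-laden Python illustration, not the directory technique of the disclosure; the function name `build_index`, the key names ("name", "depth", "parent", "kind", "color") and the sample nodes are all hypothetical.

```python
from collections import defaultdict


def build_index(nodes):
    """Map (key, value) pairs to the set of node numbers that carry them."""
    index = defaultdict(set)
    for node in nodes:
        chain = node["chain"]          # node number first, then ancestors
        number = chain[0]
        index[("name", node["name"])].add(number)
        index[("depth", len(chain))].add(number)      # position in hierarchy
        if len(chain) > 1:
            index[("parent", chain[1])].add(number)
        for key, value in node.get("attributes", {}).items():
            index[(key, value)].add(number)
    return dict(index)


nodes = [
    {"name": "Node1", "chain": (1,), "attributes": {"kind": "presents"}},
    {"name": "Node2", "chain": (2, 1), "attributes": {"kind": "toy", "color": "green"}},
    {"name": "Node3", "chain": (3, 1), "attributes": {"kind": "toy", "color": "blue"}},
]
index = build_index(nodes)
print(sorted(index[("kind", "toy")]))
```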

The storing of the XML document and sub document nodes and the creation of an index may be carried out according to known techniques for storing information to a directory and creating a searchable index. For example, the techniques used for handling LDAP directories may be used to handle the creation of a searchable index.

The above directory may be searched for a particular XML document or sub document (Step S54). To facilitate the search, the created searchable index may be used. After the search has been performed, search results may be returned (Step S55). For example, the full XML text of the documents and/or sub documents resulting from the search may be displayed.
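Steps S54 and S55 can be illustrated with a lookup that intersects index entries and returns the stored XML text. This is a minimal Python sketch under stated assumptions; the function name `search`, the shape of `index` (pairs mapped to sets of node numbers) and of `store` (node number mapped to XML text), and the sample data are hypothetical.

```python
def search(index, store, **criteria):
    """Return the XML text of every node matching all given criteria."""
    matches = None
    for key, value in criteria.items():
        hits = index.get((key, value), set())
        # Keep only nodes that satisfy every criterion seen so far.
        matches = hits if matches is None else matches & hits
    return [store[number] for number in sorted(matches or ())]


# Hypothetical stored node text and a matching hand-built index.
store = {
    2: '<toy name="dragon"><color>green</color><type>fluffy</type></toy>',
    3: '<toy name="horse"><color>blue</color><type>hairy</type></toy>',
}
index = {
    ("kind", "toy"): {2, 3},
    ("color", "green"): {2},
    ("color", "blue"): {3},
}

for xml_text in search(index, store, kind="toy", color="green"):
    print(xml_text)
```

Because only index entries are consulted, no stored document has to be scanned linearly to answer the query.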

FIG. 6 shows an example of a computer system which may implement the method and system of the present disclosure. The system and method of the present disclosure may be implemented in the form of a software application running on a computer system, for example, a mainframe, personal computer (PC), handheld computer, server, etc. The software application may be stored on a recording medium locally accessible by the computer system and accessible via a hard wired or wireless connection to a network, for example, a local area network, or the Internet.

The computer system referred to generally as system 1000 may include, for example, a central processing unit (CPU) 1001, random access memory (RAM) 1004, a printer interface 1010, a display unit 1011, a local area network (LAN) data transmission controller 1005, a LAN interface 1006, a network controller 1003, an internal bus 1002, and one or more input devices 1009, for example, a keyboard, mouse, etc. As shown, the system 1000 may be connected to a data storage device, for example, a hard disk, 1008 via a link 1002.

The above specific embodiments are illustrative, and many variations can be introduced on these embodiments without departing from the spirit of the disclosure or from the scope of the appended claims. For example, elements and/or features of different illustrative embodiments may be combined with each other and/or substituted for each other within the scope of this disclosure and appended claims.

1. A method for storing an XML document, the method comprising: decomposing the XML document into a hierarchy of nodes; and creating an index of the nodes.
2. The method of claim 1, wherein decomposing the XML document into a hierarchy of nodes comprises storing the XML document as a node at a highest level of the hierarchy of nodes.
3. The method of claim 2, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more portions of the XML document as nodes of the hierarchy of nodes, wherein each of the nodes of the hierarchy of nodes that is not the highest level of the hierarchy of nodes is a portion of the node immediately above it in the hierarchy of nodes.
4. The method of claim 2, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more first portions of the XML document as nodes at a second-highest level of the hierarchy of nodes.
5. The method of claim 4, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more second portions of the XML document which are also a portion of the corresponding first portion of the XML document, as nodes at a third-highest level of the hierarchy of nodes.
6. The method of claim 5, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more subsequent portions of the XML document which are also portions of the corresponding portion of the XML document directly above the corresponding one or more subsequent portions of the XML document in the hierarchy of nodes, as a node at a level of the hierarchy immediately below the corresponding one or more subsequent portions of the XML document.
7. The method of claim 1, wherein each node of the hierarchy of nodes contains a unique directory identifier.
8. The method of claim 1, wherein each node of the hierarchy of nodes contains a node element name.
9. The method of claim 1, wherein each node of the hierarchy of nodes contains one or more node attributes.
10. A method for searching an XML document, the method comprising: searching the XML document using an index that has been created by decomposing one or more XML documents into a hierarchy of nodes.
11. The method of claim 10, wherein one or more of the one or more XML documents is a sub document.
12. The method of claim 10, wherein decomposing an XML document of the one or more XML documents into a hierarchy of nodes comprises storing the XML document as a node at a highest level of the hierarchy of nodes.
13. The method of claim 12, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more portions of the XML document as nodes of the hierarchy of nodes wherein each of the nodes of the hierarchy of nodes that is not the highest level of the hierarchy of nodes is a portion of the node immediately above it on the hierarchy of nodes.
14. The method of claim 12, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more first portions of the XML document as nodes at a second-highest level of the hierarchy of nodes.
15. The method of claim 14, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more second portions of the XML document which are also a portion of the corresponding first portion of the XML document, as nodes at a third-highest level of the hierarchy of nodes.
16. The method of claim 15, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more subsequent portions of the XML document which are also portions of the corresponding portion of the XML document directly above the corresponding one or more subsequent portions of the XML document in the hierarchy of nodes, as a node at a level of the hierarchy immediately below the corresponding one or more subsequent portions of the XML document.
17. The method of claim 10, wherein each node of the hierarchy of nodes contains a unique directory identifier.
18. The method of claim 10, wherein each node of the hierarchy of nodes contains a node element name.
19. The method of claim 10, wherein each node of the hierarchy of nodes contains one or more node attributes.
20. A system for storing an XML document, comprising: a decomposing unit for decomposing the XML document into a hierarchy of nodes; and a creating unit for creating an index of the nodes.
21. The system of claim 20, wherein decomposing the XML document into a hierarchy of nodes comprises storing the XML document as a node at a highest level of the hierarchy of nodes.
22. The system of claim 21, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more portions of the XML document as nodes of the hierarchy of nodes wherein each of the nodes of the hierarchy of nodes that is not the highest level of the hierarchy of nodes is a portion of the node immediately above it on the hierarchy of nodes.
23. The system of claim 21, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more first portions of the XML document as nodes at a second-highest level of the hierarchy of nodes.
24. The system of claim 23, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more second portions of the XML document which are also a portion of the corresponding first portion of the XML document, as nodes at a third-highest level of the hierarchy of nodes.
25. The system of claim 24, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more subsequent portions of the XML document which are also portions of the corresponding portion of the XML document directly above the corresponding one or more subsequent portions of the XML document in the hierarchy of nodes, as a node at a level of the hierarchy immediately below the corresponding one or more subsequent portions of the XML document.
26. The system of claim 20, wherein each node of the hierarchy of nodes contains a unique directory identifier.
27. The system of claim 20, wherein each node of the hierarchy of nodes contains a node element name.
28. The system of claim 20, wherein each node of the hierarchy of nodes contains one or more node attributes.
29. A system for searching an XML document, comprising: a searching unit for searching the XML document using an index that has been created by decomposing one or more XML documents into a hierarchy of nodes.
30. The system of claim 29, wherein one or more of the one or more XML documents is a sub document.
31. The system of claim 29, wherein decomposing an XML document of the one or more XML documents into a hierarchy of nodes comprises storing the XML document as a node at a highest level of the hierarchy of nodes.
32. The system of claim 31, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more portions of the XML document as nodes of the hierarchy of nodes, wherein each of the nodes of the hierarchy of nodes that is not the highest level of the hierarchy of nodes is a portion of the node immediately above it on the hierarchy of nodes.
33. The system of claim 31, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more first portions of the XML document as nodes at a second-highest level of the hierarchy of nodes.
34. The system of claim 33, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more second portions of the XML document which are also a portion of the corresponding first portion of the XML document, as nodes at a third-highest level of the hierarchy of nodes.
35. The system of claim 34, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more subsequent portions of the XML document which are also portions of the corresponding portion of the XML document directly above the corresponding one or more subsequent portions of the XML document in the hierarchy of nodes, as a node at a level of the hierarchy immediately below the corresponding one or more subsequent portions of the XML document.
36. The system of claim 29, wherein each node of the hierarchy of nodes contains a unique directory identifier.
37. The system of claim 29, wherein each node of the hierarchy of nodes contains a node element name.
38. The system of claim 29, wherein each node of the hierarchy of nodes contains one or more node attributes.
39. A computer system comprising: a processor; and a computer recording medium including computer executable code executable by the processor for storing an XML document, the computer executable code comprising: code for decomposing the XML document into a hierarchy of nodes; and code for creating an index of the nodes.
40. The computer system of claim 39, wherein the code for decomposing the XML document into a hierarchy of nodes comprises code for storing the XML document as a node at a highest level of the hierarchy of nodes.
41. The computer system of claim 40, wherein the code for decomposing the XML document into a hierarchy of nodes comprises code for storing one or more portions of the XML document as nodes of the hierarchy of nodes wherein each of the nodes of the hierarchy of nodes that is not the highest level of the hierarchy of nodes is a portion of the node immediately above it on the hierarchy of nodes.
42. The computer system of claim 40, wherein the code for decomposing the XML document into a hierarchy of nodes comprises code for storing one or more first portions of the XML document as nodes at a second-highest level of the hierarchy of nodes.
43. The computer system of claim 42, wherein the code for decomposing the XML document into a hierarchy of nodes comprises code for storing one or more second portions of the XML document which are also a portion of the corresponding first portion of the XML document, as nodes at a third-highest level of the hierarchy of nodes.
44. The computer system of claim 43, wherein the code for decomposing the XML document into a hierarchy of nodes comprises code for storing one or more subsequent portions of the XML document which are also portions of the corresponding portion of the XML document directly above the corresponding one or more subsequent portions of the XML document in the hierarchy of nodes, as a node at a level of the hierarchy immediately below the corresponding one or more subsequent portions of the XML document.
45. The computer system of claim 39, wherein each node of the hierarchy of nodes contains a unique directory identifier.
46. The computer system of claim 39, wherein each node of the hierarchy of nodes contains a node element name.
47. The computer system of claim 39, wherein each node of the hierarchy of nodes contains one or more node attributes.
48. A computer system comprising: a processor; and a computer recording medium including computer executable code executable by the processor for searching an XML document, the computer executable code comprising: code for searching for the XML document using an index that has been created by decomposing one or more XML documents into a hierarchy of nodes.
49. The computer system of claim 48, wherein one or more of the one or more XML documents is a sub document.
50. The computer system of claim 48, wherein decomposing an XML document of the one or more XML documents into a hierarchy of nodes comprises storing the XML document as a node at a highest level of the hierarchy of nodes.
51. The computer system of claim 50, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more portions of the XML document as nodes of the hierarchy of nodes wherein each of the nodes of the hierarchy of nodes that is not the highest level of the hierarchy of nodes is a portion of the node immediately above it on the hierarchy of nodes.
52. The computer system of claim 50, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more first portions of the XML document as nodes at a second-highest level of the hierarchy of nodes.
53. The computer system of claim 52, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more second portions of the XML document which are also a portion of the corresponding first portion of the XML document, as nodes at a third-highest level of the hierarchy of nodes.
54. The computer system of claim 53, wherein decomposing the XML document into a hierarchy of nodes comprises storing one or more subsequent portions of the XML document which are also portions of the corresponding portion of the XML document directly above the corresponding one or more subsequent portions of the XML document in the hierarchy of nodes, as a node at a level of the hierarchy immediately below the corresponding one or more subsequent portions of the XML document.
55. The computer system of claim 48, wherein each node of the hierarchy of nodes contains a unique directory identifier.
56. The computer system of claim 48, wherein each node of the hierarchy of nodes contains a node element name.
57. The computer system of claim 48, wherein each node of the hierarchy of nodes contains one or more node attributes.